12 research outputs found

siEDM: an efficient string index and search algorithm for edit distance with moves

Author: Kuboyama Tetsuji
Nakashima Kenta
Sakamoto Hiroshi
Tabei Yasuo
Takabatake Yoshimasa
Publication date: 01/04/2016

Although several self-indexes for highly repetitive text collections exist, developing an index and search algorithm that supports editing operations remains a challenge. Edit distance with moves (EDM) is a string-to-string distance measure that allows substring moves in addition to the ordinary editing operations for turning one string into another. Although computing EDM is intractable, it has a wide range of potential applications, especially in approximate string retrieval. Despite the importance of computing EDM, there has been no efficient method for indexing and searching large text collections based on the EDM measure. We propose the first such algorithm, named string index for edit distance with moves (siEDM), for indexing and searching strings with EDM. siEDM builds an index structure by leveraging the idea behind edit sensitive parsing (ESP), an efficient algorithm that approximates EDM with guaranteed upper and lower bounds on the exact EDM. siEDM efficiently prunes the search space for query strings, which enables fast query searches with the same guarantee as ESP. We experimentally tested the ability of siEDM to index and search strings on benchmark datasets and demonstrated its efficiency.

Comment: 23 pages

arXiv.org e-Print Archive

Directory of Open Access Journals
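EDM differs from ordinary edit distance in that a whole substring can be moved in a single operation. The sketch below compares the two measures on toy strings. It is not siEDM or ESP: `edm_bruteforce` and `neighbors` are hypothetical helpers that run an exponential breadth-first search, and they assume that inserted or substituted characters come only from the two input strings and that intermediate strings are at most one character longer than the longer input. Under those assumptions the result is exact; otherwise treat it as an upper bound.

```python
from collections import deque
from typing import Iterator


def levenshtein(s: str, t: str) -> int:
    """Classic edit distance: insertions, deletions, substitutions (no moves)."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        cur = [i]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1,              # delete a
                           cur[j - 1] + 1,           # insert b
                           prev[j - 1] + (a != b)))  # substitute or match
        prev = cur
    return prev[-1]


def neighbors(u: str, alphabet: str) -> Iterator[str]:
    """All strings one EDM operation away from u (hypothetical helper)."""
    n = len(u)
    for i in range(n):
        yield u[:i] + u[i + 1:]                      # delete u[i]
        for c in alphabet:
            if c != u[i]:
                yield u[:i] + c + u[i + 1:]          # substitute u[i] by c
    for i in range(n + 1):
        for c in alphabet:
            yield u[:i] + c + u[i:]                  # insert c before position i
    for i in range(n):
        for j in range(i + 1, n + 1):
            block, rest = u[i:j], u[:i] + u[j:]
            for k in range(len(rest) + 1):
                yield rest[:k] + block + rest[k:]    # move substring u[i:j] to position k


def edm_bruteforce(s: str, t: str) -> int:
    """BFS over strings; exponential, so only usable on toy inputs."""
    alphabet = "".join(sorted(set(s + t)))   # assumption: no foreign characters needed
    max_len = max(len(s), len(t)) + 1        # assumption: cap on intermediate length
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return dist[u]
        for v in neighbors(u, alphabet):
            if len(v) <= max_len and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    raise ValueError("t is unreachable under the length cap")


print(levenshtein("abcdef", "defabc"), edm_bruteforce("abcdef", "defabc"))
```

This prints `6 1`: moving the block `abc` to the end costs one EDM operation, whereas ordinary edit distance needs six.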

A Space-Optimal Grammar Compression

Author: I Tomohiro
Sakamoto Hiroshi
Takabatake Yoshimasa
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 25th Annual European Symposium on Algorithms (ESA 2017)
Publication date: 01/01/2017

A grammar compression is a context-free grammar (CFG) that deterministically derives a single string. For an input string of length N over an alphabet of size σ, the smallest CFG is O(log N)-approximable in the offline setting and O(log N log* N)-approximable in the online setting. In addition, an information-theoretic lower bound for representing a CFG in Chomsky normal form with n variables is log(n!/n^σ) + n + o(n) bits. Although there is an online grammar compression algorithm that directly computes the succinct encoding of its output CFG with an O(log N log* N) approximation guarantee, the problem of optimizing its working space has remained open. We propose a fully-online algorithm whose working space asymptotically matches this lower bound, with O(N log log n) compression time. In addition, we propose several techniques to boost grammar compression and show their efficiency through computational experiments.

Dagstuhl Research Online Publication Server
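To make the notion of a grammar compression concrete, the hypothetical sketch below builds a grammar deriving exactly one string using the well-known RePair heuristic (repeatedly replace the most frequent adjacent pair with a new variable), then folds the leftover sequence so that every variable rewrites to exactly two symbols, a binary form close to Chomsky normal form. It is not the fully-online, space-optimal algorithm proposed in the paper. It also evaluates the stated lower bound log(n!/n^σ) + n with the o(n) term dropped, assuming base-2 logarithms. All function names are illustrative assumptions.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple, Union

Symbol = Union[str, int]                   # str = terminal character, int = variable id
Rules = Dict[int, Tuple[Symbol, Symbol]]   # variable -> (left, right)


def repair_grammar(text: str) -> Tuple[Symbol, Rules]:
    """Return (start symbol, rules) of a binary grammar deriving exactly `text`."""
    if not text:
        raise ValueError("text must be non-empty")
    seq: List[Symbol] = list(text)
    rules: Rules = {}
    next_id = 0
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, freq = pairs.most_common(1)[0]
        if freq < 2:                       # no repeated pair left to factor out
            break
        rules[next_id] = pair
        out: List[Symbol] = []
        i = 0
        while i < len(seq):                # replace non-overlapping occurrences left to right
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(next_id)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
        next_id += 1
    start = seq[0]
    for sym in seq[1:]:                    # fold the leftover sequence into binary rules
        rules[next_id] = (start, sym)
        start = next_id
        next_id += 1
    return start, rules


def expand(sym: Symbol, rules: Rules) -> str:
    """Derive the string generated by `sym` (recursive; fine for small grammars)."""
    if isinstance(sym, str):
        return sym
    left, right = rules[sym]
    return expand(left, rules) + expand(right, rules)


def cnf_lower_bound_bits(n: int, sigma: int) -> float:
    """log2(n! / n^sigma) + n: the stated bound without its o(n) term."""
    return (math.lgamma(n + 1) - sigma * math.log(n)) / math.log(2) + n


start, rules = repair_grammar("abababab")
print(rules, expand(start, rules))
```

For `abababab` this yields three rules (0 → ab, 1 → 0 0, 2 → 1 1), and expanding the start variable 2 recovers the input.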

Phase III placebo-controlled, double-blind, randomized trial of pegfilgrastim to reduce the risk of febrile neutropenia in breast cancer patients receiving docetaxel/cyclophosphamide chemotherapy

Author: A Bosly
A Chan
CL Vogel
D Soong
D Takabatake
DR Gandara
GH Lyman
H Kawabata
HS Han
JC Trent
Kazuo Tamura
M Martín
MS Aapro
N Masuda
Norikazu Masuda
Ryutaro Shimazaki
S Blackwell
SE Jones
Seigo Nakamura
T Masaoka
T Vandenberg
T Younis
TJ Smith
Toshiaki Saeki
Toshimi Takano
Yoshiaki Rai
Yoshimasa Kosaka
Yoshinori Ito
Yutaka Tokuda
Publication venue: Springer Science and Business Media LLC

Crossref

Approximate Frequent Pattern Discovery in Compressed Space

Author: Hiroshi SAKAMOTO
Shouhei FUKUNAGA
Tomohiro I
Yoshimasa TAKABATAKE
Publication venue: Institute of Electronics, Information and Communication Engineers (IEICE)
Publication date: 01/01/2018

Crossref